Europäisches Patentamt
European Patent Office
Office européen des brevets

Publication number: 0 656 582 A1

EUROPEAN PATENT APPLICATION

Application number: 94112045.3
Date of filing: 02.08.94
Int. Cl.: G06F 7/48, G06F 7/50, G06F 7/544

Priority: 29.11.93 US 158649
Date of publication of application: 07.06.95 Bulletin 95/23
Designated Contracting States: DE FR GB

Applicant: Hewlett-Packard Company
3000 Hanover Street
Palo Alto, California 94304 (US)

Inventor: Lee, Ruby Bei-Loh
12933 Atherton Court
Los Altos Hills, CA 94022 (US)

Inventor: Beck, John Paul
7 Florence Way
Tyngsborough, MA 01879 (US)

Representative: Liesegang, Roland, Dr.-Ing. et al
FORRESTER & BOEHMERT
Franz-Joseph-Strasse 38
D-80801 München (DE)

Parallel adding and averaging circuit and method.






An apparatus [10, 30, 100] that can also be used
for generating the average of two integers. The
apparatus [10, 30, 100] can be divided into a plurality
of sub-adders [102] that operate on sub-words of the
input integers in parallel. Hence, the adder can be
used for adding or subtracting one set of two integers
wherein each integer is of some predetermined
length or a plurality of sets of two integers
provided the sum of the lengths of the integers is
less than or equal to this predetermined length. The
apparatus [10, 30, 100] can also generate the sum, or
difference, of each of the sub-words divided by two.
The parallel operations can be carried out in response
to a single instruction. The results of the
division by two are rounded in a manner that eliminates
biasing of the results.

Field of the Invention

The present invention relates to computers,
and more particularly, to arithmetic units for use
therein.

Background of the Invention 

Computers normally include an arithmetic logic 
unit that includes an adder that adds numbers of 
some maximum number of bits. Adders for words 
of length 32 and 64 bits are common in micropro- 
cessors and the like. While these adders will also 
operate on much smaller words, when doing so, 
the majority of the logic circuits contained in the 
adders are idle. For example, a 64-bit adder can be 
used to add two 8-bit words by placing each of the 
8-bit words in the least significant portion of a 
corresponding 64-bit word and then adding the 64- 
bit words. During the addition, the logic circuitry 
concerned with adding the 7 high order bytes of 
each of the words is effectively idle. Hence, 7/8ths
of the capacity of the adder is being wasted during 
this operation. 

Computations involving the addition of a large 
number of pairs of small words are often encoun- 
tered in multi-media data processing. For example, 
consider the problem of adding two gray-scale 
images to generate a sum image. The pixels of the 
images are typically stored as one byte integers 
representing the light intensity at a corresponding 
point in the image. Since storage space is always 
at a premium, the pixels of the image are typically 
packed into words. If the basic word size on the
computer is 32 bits, the pixels could be packed
four per word. Each image may have a million
pixels. Hence, the computation of the sum image
involves adding two pixels, one from each component
image, to generate a corresponding pixel in the
sum image. If there are a million pixels in each
image, a million such additions must be performed.
Each addition requires the "unpacking" of two
words, one from each component image, the addition
of the two unpacked bytes, and then the storage
of the result in the correct byte of a word in
the sum image.
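
The unpack, add and repack sequence just described can be sketched in software. The following is a minimal illustration only, assuming four 8-bit pixels per 32-bit word; the function name `add_packed_conventional` is hypothetical and does not appear in the application.

```python
def add_packed_conventional(x_word: int, y_word: int) -> int:
    """Add four packed 8-bit pixels one pair at a time (hypothetical helper)."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (x_word >> shift) & 0xFF   # unpack one pixel from each word
        b = (y_word >> shift) & 0xFF
        s = (a + b) & 0xFF             # one add per pixel; wraps modulo 256
        result |= s << shift           # repack into the correct byte
    return result
```

Each 32-bit word costs four separate additions plus the shifting and masking around them, which is the overhead the passage above points to.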

If the basic word size of the computer is 32 bits,
the computer will normally have a 32-bit adder.
During these computations, 75% of the adding
capacity of the adder will be idle. Hence, a conven- 
tional arithmetic logic unit is not being used opti- 
mally when performing this type of image calcula- 
tion. 

The computation times encountered in these
types of operations can be excessive. Hence, special
parallel computer architectures are often employed
to reduce the time between the execution of
the sum image command and the time at which the
sum image is completed. Since all of the additions
are independent of each other, the adds can be
performed in parallel without regard to ordering. A
computer with M adders can, in principle, provide a
result in 1/Mth the time provided the movement of
the pixels from memory to the adders does not
become a bottleneck. Hence, it would be advantageous
to provide a computer architecture in which
multiple additions can be performed in parallel.
Unfortunately, the cost of providing these additional
adders and the hardware needed to control them is
often prohibitive.

It should also be noted that the image addition
problem discussed above often includes generating
an average image. Generating an image having pixels
that are the average of the corresponding pixels in the
component images is equivalent to generating the
pixels of the sum image discussed above and then
dividing the intensity of each of the sum image
pixels by 2. This type of image computation is
preferred since it prevents overflows. If the two
corresponding component image pixel values both have
values of 128 or more, then the sum image pixel
cannot be represented as an 8-bit integer. To avoid
this problem, the average image is used, since the
average image pixels will always be representable
as one byte integers if the component image pixels
were one byte integers.

Broadly, it is the object of the present invention
to provide an improved adder.

It is a further object of the present invention to
provide an adder that operates at high efficiency
when performing multiple additions involving words
that are smaller than the width of the adder.

It is yet another object of the present invention
to provide an adder that can generate the average
of two numbers in a single machine cycle.

It is a still further object of the present invention
to provide an adder that is adapted for computing
a plurality of average values of pairs of
numbers that are smaller than the width of the
adder.

These and other objects of the present invention
will become apparent to those skilled in the art
from the following detailed description of the invention
and the accompanying drawings.

Summary of the Invention 

The present invention is an arithmetic logic unit
that can also be used for generating the average of
two integers. The arithmetic logic unit includes an
adder that can be divided into a plurality of sub-adders
that operate on sub-words of the input
integers in parallel. Hence, the adder can be used
for adding one set of two integers wherein each
integer is of some predetermined length or a plurality
of sets of two integers provided the sum of
the lengths of the integers is less than or equal to
this predetermined length. The adder is constructed
from a plurality of adder stages connected in an
ordered sequence. Each adder stage operates on
one or more bits to generate sum bits and a carry 
output bit. Disconnect circuitry is provided at the 
potential boundaries of sub-words to prevent the 
carry output from propagating across sub-word 
boundaries when the input words are partitioned 
into sub-words. 

In embodiments of the present invention having 
the ability to compute the average of two operands, 
each output bit of the arithmetic logic unit includes 
a multiplexer that implements a right shift operation 
on the results generated by the sub-adders. The 
multiplexer connected to the most significant bit of 
a result connects that bit to the carry bit generated 
by the addition of the most significant bits of the 
corresponding sub-words. The multiplexers con- 
nected to the least significant bit of each result 
may be modified to implement a round-odd round- 
ing scheme to prevent biasing of the average re- 
sults. 

Brief Description of the Drawings 

Figure 1 is a block diagram of an adder ac- 
cording to the present invention. 

Figure 2 is a block diagram of a portion of an 
adder according to the present invention which 
only performs additions. 

Figure 3 is a block diagram of a portion of an
adder according to the present invention. 

Figure 4 is a block diagram of a multiplexer 
arrangement that may be utilized in constructing 
embodiments of the present invention. 

Detailed Description of the Invention 

The present invention is an improved adder
that can be configured to perform a plurality of
partial word adds or averaging operations in parallel.
Referring to Figure 1, an adder 10 according
to the present invention accepts two N-bit operands
12 and 14. The bits of the first operand will be
denoted by X_i, for i=0 to N-1, and the bits of the
second operand will be denoted by Y_i. The
operands are typically stored in two of the registers
in the processor of the computer. However, the
present invention could be implemented in forms of
circuitry that are not included in computers. When
operated as a conventional adder, adder 10 generates
an N-bit output word 16 having the two's
complement sum of X and Y and a 1-bit carry-out.
The bits of output 16 of adder 10 will be denoted
by Z_i in the following discussion. The result of the
addition is typically stored back into one of the
CPU registers.



In the following discussion, the bits in the various
words will be numbered from least significant
to most significant. That is, X_0 is the least significant
bit of operand X, and X_{N-1} is the most significant
bit of the X operand. The same convention will
be used for the Y and Z words.

The present invention allows each of the
operands to be divided into a plurality of sub-words.
For simplicity, the present invention will be
explained first in terms of a single division of each
of the operands into partial operands. In this case,
the first k bits of the X operand, X_0 through X_{k-1},
are the bits of the first partial operand 18 of the X
word, and the remaining bits, X_k through X_{N-1}, are
the bits of the second partial operand 17 of the X
word. The Y operand is similarly divided into partial
words 19 and 20. In this mode, bits Z_0 through
Z_{k-1} are the bits of the sum of partial operands 18
and 20, and bits Z_k through Z_{N-1} are the bits of the
sum of partial operands 17 and 19, respectively.
These two results will be referred to as partial or
sub-word sums in the following discussion. As will
be explained in more detail below, the present
invention may also be used to compute the average
value of each pair of corresponding partial operands. If the
average is being computed, bits Z_0 through Z_{k-1}
are the bits of the average of partial operands 18
and 20, and bits Z_k through Z_{N-1} are the bits of the
average of partial operands 17 and 19, respectively.

The present invention will be explained first in
terms of an adder constructed from single bit adding
stages which utilize a ripple carry architecture,
also known as carry propagate architecture. The
manner in which the teachings of the present invention
can be utilized on other adder architectures
will be discussed in more detail below. Refer now
to Figure 2 which is a block diagram of a portion of
an adder 30 according to the present invention
which only performs additions. That is, adder 30
does not perform averaging calculations. Adder 30
is constructed from an array of one bit adder
stages. The present invention differs from a conventional
adder in that the stages may be decoupled
to allow the adder to perform parallel additions
on the partial words. Each single bit adder
adds two bits, one from the X operand and one
from the Y operand, and a carry bit from the
previous stage in the adder, denoted by C_i for the
i-th stage, to generate a sum bit and a new carry bit.
The two stages shown in Figure 2 are the single bit
adders used to add the most significant bits of
partial operands 18 and 20 and the least significant
bits of partial operands 17 and 19. Single bit adder
31, for example, adds bits C_{k-2}, X_{k-1} and Y_{k-1} to
generate sum bit S_{k-1} and carry bit C_{k-1}. In the
following discussion the stage of the adder that
adds bits X_p and Y_p to generate a sum bit S_p will
be referred to as the p-th stage of the adder. In a
conventional adder utilizing carry propagation, the
carry bit from each stage is propagated to the next
stage by connecting the carry bit input of each
stage to the carry bit output of the stage before it
in the array of one bit adders.

In the present invention, the carry bit from the 
stage just before the boundary separating the two 
partial operands is connected to a blocking circuit 
33. If adder 30 is being used as a conventional 
adder operating on the entire contents of registers 
12 and 14 treated as single words, blocking circuit 
33 connects the carry output of single bit adder 31 
to the carry input of single bit adder 32. If adder 30 
is being used to perform two adds in parallel with
a partial word boundary between bits k-1 and k in
each register, then blocking circuit 33 merely prevents
the carry bit from single bit adder 31 from
propagating to the carry input of single bit adder
32. This is accomplished in response to a 1-bit
signal M_k. The manner in which the bits M_k are
specified will be discussed in more detail below.
The carry outputs of all of the other single bit 
adders are connected in the conventional manner 
in the remaining stages of adder 30. Hence, carry 
bits propagate in the conventional manner within 
each section of adder 30 that is operating on a 
particular partial operand. The sum bits from each 
adder stage are connected to the corresponding 
bits of the output port. 
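
The behaviour of the blocking circuit can be modelled bit by bit. The sketch below is illustrative only: the function name `partitioned_add` is hypothetical, and it assumes that bit k of the integer `boundary` stands for the signal M_k, so that a set bit blocks the carry between stage k-1 and stage k.

```python
def partitioned_add(x: int, y: int, n: int, boundary: int) -> int:
    """Ripple-carry add of two n-bit words with optional carry blocking."""
    z = 0
    carry = 0
    for i in range(n):
        if (boundary >> i) & 1:
            carry = 0                        # blocking circuit: drop incoming carry
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        s = xi ^ yi ^ carry                  # sum bit of a one-bit full adder
        carry = (xi & yi) | (carry & (xi | yi))  # carry out of this stage
        z |= s << i
    return z
```

With `boundary` equal to zero the model behaves as an ordinary n-bit adder. With `boundary = 0x01010100` and n = 32 it behaves as four independent 8-bit adders.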

In conventional adders, the carry bit from the 
single bit adder operating on the most significant 
bit of the operands is used to detect overflows. If 
this feature is to be implemented for each of the 
partial operands, the carry bit from the most signifi- 
cant bit additions of each of the partial operands is 
connected to an appropriate overflow circuit. In one 
embodiment of the present invention, the carry bits
are ORed together and the resultant bit used to
detect an overflow. This bit can be used to trigger
a trap on overflow or it may be ORed with the
contents of a single bit register. In the latter case,
the program can check the contents of the register 
to determine if any operation since the last time the 
register was checked has resulted in an overflow. 

If no averaging computations are to be imple- 
mented, the present invention can be constructed 
from a conventional adder merely by including a 
blocking circuit such as blocking circuit 33 between 
each pair of stages that could span the boundary 
between two partial operands. If the X and Y inputs 
are to be divisible into an arbitrary number of 
partial operands of arbitrary size, then a blocking 
section is included between each pair of single bit
adders. The blocking sections are configured by a
mask having bits M_k that specify the location of the
most significant bit of each partial operand. This
mask will be referred to as the boundary mask in
the following discussion. The bits of the mask may
be stored in a register in the adder or may be
generated directly from the instruction being executed
by the instruction decoding circuitry of the
processor in which the adder is located.

The above embodiment of the present invention
performs additions of the operands. Blocking
circuit 33 replaces the carry bit by 0 during an
addition in which the adder is divided into sub-adders
with a boundary at the blocking circuit. If
the adder is to be also used for 2's complement
subtractions in which the adder is likewise divided,
the carry bit must be forced to be a 1 instead of a
0. Blocking circuit 33 shown in Figure 2 implements
both additions and subtractions by providing
an input F having a value of either "0" or "1".
When a boundary is active at blocking circuit 33,
the value of F is the value presented to the next
stage. If the boundary is inactive, then blocking
circuit 33 merely transmits the carry bit, C_{k-1}, to
the next stage.
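
The role of the input F can be illustrated with a short model. This is a sketch under stated assumptions: the name `partitioned_addsub` is hypothetical, subtraction is assumed to be performed by inverting Y and adding, and the carry into stage 0 is assumed to take the same value F as the carries injected at active boundaries.

```python
def partitioned_addsub(x: int, y: int, n: int, boundary: int,
                       subtract: bool = False) -> int:
    """Sub-word add or two's complement subtract with carry value F at boundaries."""
    f = 1 if subtract else 0                 # the value F described in the text
    if subtract:
        y = ~y & ((1 << n) - 1)              # invert Y; the +1 is supplied by F
    z = 0
    carry = f                                # assumed carry into stage 0
    for i in range(n):
        if (boundary >> i) & 1:
            carry = f                        # active boundary: present F to this stage
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        z |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi | yi))
    return z
```

For example, with byte boundaries the third byte lane of 0x05100320 minus 0x01200110 computes 0x10 - 0x20, which wraps to 0xF0 without borrowing from the lane above it.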

The above-described embodiments of the
present invention do not perform averaging computations.
While these embodiments provide significant
advantages over the prior art, the preferred
embodiment of the present invention provides additional
hardware that provides the capability of computing
averages. The additional hardware required
to perform averaging computations will now be
discussed. The average of two operands is the
sum of the two operands divided by two. A division
by two is equivalent to shifting the sum of the two
operands to the right by one bit. The least significant
bit of the sum is lost. The most significant bit
of the average is the carry output of the single bit
adder operating on the most significant bit of the
partial operands. For example, the average of partial
operands 18 and 20 can be computed by
routing S_i to Z_{i-1} for i = 1,...,k-1, discarding S_0, and
routing C_{k-1} to Z_{k-1}.
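
The routing just described can be checked with a small model of one k-bit sub-word. This is a sketch only; the function name `average_truncated` is hypothetical, and the operands are assumed to be unsigned.

```python
def average_truncated(a: int, b: int, k: int) -> int:
    """Average of two unsigned k-bit values using the shift-and-carry routing."""
    total = a + b
    s = total & ((1 << k) - 1)          # sum bits S_0 .. S_{k-1}
    c = total >> k                      # carry out C_{k-1} of the top stage
    # S_i goes to Z_{i-1}, S_0 is discarded, and the carry becomes Z_{k-1}.
    return (s >> 1) | (c << (k - 1))
```

Because the carry is kept as the top result bit, the result always fits in k bits even when the sum itself does not.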

For any given division of the inputs into partial
operands, there are three types of stages. The
stage adding the least significant bits of the partial
operands discards its sum bit when performing an
average computation. This type of stage will be
referred to as a least significant bit stage in the
following discussion. The stage adding the most
significant bits of a partial operand must route the
carry bit generated thereby to the most significant
bit of the result, and its sum bit to the next most
significant bit of the result in addition to breaking
the carry propagation chain. This type of stage will
be referred to as a most significant bit stage in the
following discussion. Stages that add bits between
the least and most significant bits must route their
sum bits to a location one position to the right of
the position used when the stages performed addition.
That is, the i-th stage routes its sum bit to Z_{i-1}.
This type of stage will be referred to as an interior
bit stage in the following discussion.

While embodiments in which the partial
operand boundaries may be located at arbitrary
points in the words are possible, in general, it is 
more useful to provide for boundaries at only a few 
positions. For example, if the adder is to be used in 
a computer which normally adds 32 bit words, 
boundaries separating the 4 bytes of such words 
are more useful, since partial operands will typi- 
cally be one or two bytes in such a system. In this 
case, each adder stage must be capable of func- 
tioning in at most two of the three roles described 
above. Stage (N-1) need only perform as a most
significant bit stage. Similarly, stage 0 need only
perform as a least significant bit stage. All other
stages must be capable of operating as interior bit 
stages when the adder is used for full word addi- 
tions. In addition, those stages that can become the 
most or least significant bit stages of a parallel sub- 
word add or average must be capable of switching 
to those configurations. 

The differences between the stages lie in the
manner in which the sum signals, S_i, generated by
each adder stage are routed to the output bits, Z_i,
of the adder rather than in the details of how the
adder stages generate the sum bits from the bits of
the X and Y registers. Refer now to Figure 3 which
is a block diagram of a portion of an adder 100
according to the present invention. Adder 100 is
constructed from a plurality of 4-bit adding sections
of which adding section 102 is exemplary. Adding
section 102 operates on bits q through q+3 of the
X and Y registers to generate sum bits S_q through
S_{q+3}. Each adding section is functionally equivalent
to 4 of the single bit adders described above in
that each adding section receives 4 bits from the X
register, 4 bits from the Y register and generates
the corresponding four sum bits. In addition, adding
section 102 generates a carry bit C_{q+3} and receives
carry bit C_{q-1}. Blocking circuits 110 and 112
are shown on each side of adding section 102.

Adder 100 generates N sum bits, S_i, for i = 0 to
N-1. When adder 100 is utilized as a single
operand adder, the i-th sum bit, S_i, is routed to the
i-th output bit, Z_i, of the adder. Each output line Z_i is
connected to a multiplexer. Exemplary multiplexers
are shown at 121-124 in Figure 3. The multiplexers
are controlled by an average signal A. If A is false,
S_i is coupled to Z_i for i = 0 to N-1. If A is true, there
are three possibilities as described above with respect
to the types of stages. Those multiplexers
that are connected to output lines that can only be
interior bits of a result connect S_{i+1} to Z_i. Multiplexers
122 and 123 are examples of such multiplexers.

The second type of multiplexers are those that
are connected to output lines that can either be an
internal bit or a most significant bit of a result.
Such a multiplexer is shown at 121. These multiplexers
will be referred to as most significant bit
multiplexers. A most significant bit multiplexer is a
3-to-1 multiplexer which operates in response to A
and the mask bit that determines if the multiplexer
is connected to a most significant bit of a result. If
A is false, then a most significant bit multiplexer
connects Z_i to S_i. If A is true and the mask bit is
also true, then a most significant bit multiplexer
connects Z_i to C_i, where C_i is the carry bit generated
by the stage that generated S_i. If A is true
and the mask bit is false, then the most significant
bit multiplexer connects Z_i to S_{i+1}. It should be
noted that the multiplexer connected to Z_{N-1} is a
most significant bit multiplexer in which a 2-to-1
multiplexer is sufficient, since this multiplexer can
never be an interior multiplexer.

The third type of multiplexers are those that
are connected to output lines that can either be an
internal bit or a least significant bit of a result. Such
a multiplexer is shown at 124. These will be referred
to as least significant bit multiplexers. If the
result of the average operation is to be truncated,
i.e., the bit shifted out is to be ignored in determining
the result, a least significant bit multiplexer is
the same as an internal bit multiplexer.

While truncation may be used as a method of
rounding the result of an integer divide by shifting
right, it can cause undesirable problems that can
be prevented by other forms of rounding. Consider
the case in which an image is to be reduced in size
by averaging adjacent pixels in the image. That is,
each group of 4 pixels is to be replaced by one
pixel having a value equal to the average of the 4
pixels replaced. The image is represented by an I x I
pixel array. The pixel reduction can be accomplished
by averaging the odd and even rows in the
array to generate an I x I/2 pixel array. The odd and
even columns of this intermediate array are then
averaged to generate the final (I/2) x (I/2) array. If the
various averaging operations always round down,
the final image will have different statistical properties
than the original image. For example, the
reduced image will have a lower average light
intensity than the original array. It is often important
that this type of artifact be avoided. The present
invention provides a means for avoiding this type
of biased rounding.

In the preferred embodiment of the present
invention, round odd logic is used to prevent this
biasing. In round odd systems, the result is rounded
to the nearest odd integer if a round off error is
created by the right shift. If the answer is exact
before rounding, no change is made. A round off
error occurs whenever a 1 is shifted off the result.
This occurs when the least significant bit before
shifting is a "1". In this case, the least significant
bit of the result is set to a "1". If the bit shifted out
is a "0", then the result was exact even after the
shift, and the least significant bit of the result is
unaltered. The present invention utilizes round odd
because a round odd system can be implemented
by the addition of a single OR gate 131 to the
multiplexer. Hence, in a least significant bit multiplexer
in which unbiased rounding is used, the
multiplexer selects the signal (S_{i+1} OR S_i) for connection
to Z_i when A is true and the mask bit
indicates that the multiplexer in question is the
least significant bit of a result. When A is false, S_i
is connected to Z_i. When A is true and the mask bit
indicates that the output line is connected to an
internal bit of a result, the multiplexer connects Z_i
to S_{i+1}.
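
Round odd amounts to ORing the bit shifted out into the least significant bit of the shifted result. A minimal sketch for one unsigned k-bit sub-word follows; the function name `average_round_odd` is hypothetical.

```python
def average_round_odd(a: int, b: int, k: int) -> int:
    """Average of two unsigned k-bit values with round-odd rounding."""
    total = a + b
    s = total & ((1 << k) - 1)              # sum bits S_0 .. S_{k-1}
    c = total >> k                          # carry out of the top stage
    shifted = (s >> 1) | (c << (k - 1))     # right shift with carry as top bit
    return shifted | (s & 1)                # OR gate: Z_0 = S_1 OR S_0
```

For instance, 3 and 4 average to 3.5 and give 3, while 4 and 5 average to 4.5 and give 5. Exact results such as the average of 4 and 4 are left unchanged.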

It can be shown that the average error obtained 
with this form of rounding is zero provided the S 
values are uniformly distributed. It should be noted 
that a round even system also prevents biasing in 
rounding. In a round even system, the result is 
rounded to the nearest even integer if a round off 
error occurs and the result prior to rounding was
odd. However, the hardware needed to implement 
a round even scheme is significantly more com- 
plex. 
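
The bias claims can be checked numerically by summing the rounding error over every pair of 8-bit operands. This is an illustrative sketch; the helper names are hypothetical, and the error is accumulated as twice its value so that only integers are involved.

```python
def truncate(s: int) -> int:
    return s >> 1                        # always rounds down

def round_odd(s: int) -> int:
    return (s >> 1) | (s & 1)            # a single OR on the low bit

def round_even(s: int) -> int:
    r = s >> 1
    return r + (s & r & 1)               # needs an increment when s and r are both odd

def total_error_x2(round_fn, bits: int = 8) -> int:
    """Sum over all operand pairs of 2*result - (a + b)."""
    limit = 1 << bits
    return sum(2 * round_fn(a + b) - (a + b)
               for a in range(limit) for b in range(limit))
```

Under these assumptions truncation accumulates a total of -32768 over the 65536 pairs (a mean error of -0.25), while round odd and round even both total zero. The increment in `round_even` can ripple through the whole result, which is one way to see why that scheme costs more hardware than the OR gate.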

The above-described multiplexers assumed 
that only a limited number of potential partial 
operand boundaries are implemented. If, however,
the adder is to have arbitrary boundaries, then the
multiplexer attached to each output line, other than
those of the 0th and (N-1)th stages, must be capable of
performing any of the three possible multiplexer
functions. A block diagram of a multiplexer arrangement
200 that provides these functions is
shown in Figure 4. Multiplexer arrangement 200 is
constructed from a 4-to-1 multiplexer 201. It is
assumed that the multiplexer is connected to output
line Z_q in the following discussion. Multiplexer
201 either couples S_{q+1}, S_q, C_q or (S_{q+1} OR S_q) to
Z_q depending on the state of the average signal
and mask bits. Since multiplexer 201 must function
properly both as a least significant bit multiplexer
and as a most significant bit multiplexer, it must be able
to determine the state of both M_q and M_{q+1}.
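
The selection performed by the 4-to-1 multiplexer can be modelled in software. The sketch below is one possible reading rather than the circuit itself: the function name `alu` is hypothetical, bit k of `boundary` is assumed to mean that the carry is blocked between stage k-1 and stage k, and every sub-word is assumed to be at least two bits wide.

```python
def alu(x: int, y: int, n: int, boundary: int, average: bool) -> int:
    """Partitioned add or round-odd average with one output mux per bit."""
    s = [0] * n                              # sum bits S_i
    c = [0] * n                              # carry-out bits C_i
    carry = 0
    for i in range(n):
        if (boundary >> i) & 1:
            carry = 0                        # carry blocked at a sub-word boundary
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s[i] = xi ^ yi ^ carry
        carry = (xi & yi) | (carry & (xi | yi))
        c[i] = carry
    z = 0
    for q in range(n):
        is_lsb = q == 0 or (boundary >> q) & 1            # uses M_q
        is_msb = q == n - 1 or (boundary >> (q + 1)) & 1  # uses M_{q+1}
        if not average:
            bit = s[q]                       # plain addition
        elif is_msb:
            bit = c[q]                       # carry becomes the top result bit
        elif is_lsb:
            bit = s[q + 1] | s[q]            # round odd
        else:
            bit = s[q + 1]                   # interior bit: shift right by one
        z |= bit << q
    return z
```

With `boundary = 0x01010100` and `average=True`, the words 0xFF0403C8 and 0xFF050364 yield 0xFF050396: each byte lane is averaged independently, and the lane holding 4 and 5 rounds to the odd value 5.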

In addition to providing parallel addition or 
averaging in an adder having a complexity only 
slightly greater than a conventional adder, the 
present invention also improves the overall com- 
putational efficiency of many types of computation 
in which the data is packed into words at the 
beginning of the computation. Consider the image 
size reduction problem discussed above as it 
would be carried out on a computer having a 32-bit
word size. If each pixel is represented by a one
byte integer, the image data will typically be
packed into words to save storage space. Hence,
each horizontal line in the image will be represented
by I/4 32-bit words. To add the pixels of an
odd numbered line to those of the even numbered
line following it, a word containing 4 pixels of the
odd numbered line can be input to the X register
and the corresponding word from the even numbered
line input to the Y register. The contents of
the Z register can then be stored directly to memory
as one packed word of the resultant image line.
This operation can be carried out without having to
unpack the individual bytes and repack the result.
Hence, the present invention allows 4 pixels to be
processed with a single add instruction and avoids
the packing and unpacking instructions needed
with conventional adders. Finally, the present invention
automatically generates the average of the
two pixels, thereby eliminating one shift operation
after each add.
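
The row averaging step can be sketched at the word level. This is a software model only: the names `avg_packed_bytes` and `average_rows` are hypothetical, and the loop over byte lanes stands in for hardware that would process all four lanes in a single operation.

```python
def avg_packed_bytes(x_word: int, y_word: int) -> int:
    """Model of a four-lane byte average with round odd on one pair of 32-bit words."""
    z = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((x_word >> shift) & 0xFF) + ((y_word >> shift) & 0xFF)
        z |= ((total >> 1) | (total & 1)) << shift   # round-odd average of one lane
    return z

def average_rows(odd_row, even_row):
    """Average two image lines stored as lists of packed 32-bit words."""
    return [avg_packed_bytes(x, y) for x, y in zip(odd_row, even_row)]
```

Each call to `avg_packed_bytes` corresponds to one average instruction, so a line of I pixels needs I/4 such operations and the result is already packed for storage.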

Furthermore, the parallel sub-word add instruction
is no more complex than a conventional full
word add instruction. In a conventional full word
add, the programmer specifies two registers having
the operands and a third register to hold the result.
The present invention, likewise, requires only the
specification of three registers. If only a few possible
configurations of the adder in partial operand
fields are possible, these configurations can be
included in the add instruction and the instruction
decode circuitry can then generate the corresponding
mask bits. Hence, no additional instructions are
needed to set up the adder.

The above discussion assumed that the input
and output words were stored in processor registers.
Embodiments of the present invention in
which the output, and/or input, values are sent to or
received from locations specified in the add or
average instructions will be apparent to those
skilled in the art. For example, the add instruction
could specify a plurality of registers to receive the
sums, each sum being loaded in the least significant
bits of the register. Similarly, the result could
be sent directly to locations other than registers, e.g.
memory or other functional units. In addition, it will
be apparent that the operands could originate from
other functional units or memory.

While the above embodiments have been described
in terms of parallel partial sub-word averaging,
it will be apparent to those skilled in the art
that the present invention may also be used to
perform an averaging instruction on the entire contents
of the X and Y words. Such an embodiment
provides the benefit of two operations, i.e., add and
divide by two, in one machine cycle.

The above embodiments of the present invention
have been described in terms of carry propagation
type adders. In the preferred embodiment of
the present invention, carry look-ahead architecture
is used because it has smaller delays. In carry
look-ahead adders, the carry generation circuitry
produces a propagate and a generate signal corresponding
to each bit of the adder. These signals
may be used in a manner analogous to the carry
bits described above to allow the adder to be
broken into parallel sub-word adders. Consider the 
case in which the adder is to be divided such that 
stage k operates on the most significant bit of the 
sub-word result. A blocking circuit such as blocking 
circuit 33 shown in Figure 2 can be inserted into 
the carry generation logic such that the propagate 
bit and the generate bit are forced to the appro- 
priate values depending on the type of operation, 
i.e., addition or subtraction, that is being performed. 
When the adder is being used on words that are 
not broken at stage k, the blocking circuit does not 
alter the values of the propagate and generate bits 
corresponding to stage k. 
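
The following sketch illustrates one way the propagate and generate signals might be forced at a boundary. It rests on assumptions not stated in the text: the "appropriate values" are taken to be P = 0 and G = F, the forcing affects only the carry handed to the next stage, and the carry recurrence is evaluated serially for clarity, whereas a real look-ahead unit would evaluate it in parallel. The function name `lookahead_partitioned` is hypothetical.

```python
def lookahead_partitioned(x: int, y: int, n: int, boundary: int,
                          subtract: bool = False) -> int:
    """Sub-word add/subtract expressed through propagate and generate bits."""
    f = 1 if subtract else 0
    if subtract:
        y = ~y & ((1 << n) - 1)              # two's complement: invert, then add F
    p = x ^ y                                # propagate bits
    g = x & y                                # generate bits
    z = 0
    carry = f                                # assumed carry into stage 0
    for i in range(n):
        pi, gi = (p >> i) & 1, (g >> i) & 1
        z |= (pi ^ carry) << i               # sum bit uses the unmodified propagate bit
        if (boundary >> (i + 1)) & 1:        # stage i is the top of a sub-word
            pi, gi = 0, f                    # forced values (assumption): next carry is F
        carry = gi | (pi & carry)
    return z
```

When no boundary is active at a stage, its propagate and generate bits pass through unaltered, matching the behaviour described for words that are not broken at stage k.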

While the above embodiments of the present 
invention have been described in terms of instruc- 
tions as the means for triggering the various oper- 
ations performed by the present invention, it will be 
apparent to those skilled in the art that the oper- 
ations of the present invention may be triggered by 
electrical signals that are not generated by instruc- 
tions of a stored computer program. Hence, the 
term "instruction" when used in the claims shall 
also include operations triggered by other forms of 
signaling. Furthermore, it will be apparent to those 
skilled in the art that the present invention may be 
used in circuitry that is not part of a computer. 

While the above embodiments of the present
invention have been described in terms of averaging
unsigned integers, it will be apparent to those
skilled in the art that the teaching of the present
invention may be applied to signed integers. In this
case, instead of moving the carry out bit from the
most significant bit of the sum into the most
significant bit of the result, the latter is determined as
follows: If the two operands are of the same sign,
then the most significant bit of the result is the
carry out bit as before. If the two operands are of
different signs, then the most significant bit of the
result is the most significant bit of the sum prior to
the right shift. That is, the most significant bit of the
result will be (X_p AND Y_p) OR ((X_p OR Y_p) AND
C_(p-1)). In the case of a carry look-ahead adder, this
expression reduces to G_p OR (P_p AND C_(p-1)), where
G_p and P_p are the generate and propagate bits
associated with the most significant bit.
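
The case-by-case rule for signed operands can be illustrated with a short software model. The function name `signed_average` and the 8-bit default width are assumptions made for the example, the sketch applies the two cases stated in the text directly rather than a gate-level expression, and rounding of the shifted-out bit is omitted for brevity.

```python
def signed_average(x, y, n_bits=8):
    """Average two signed integers via add and right shift (hypothetical model)."""
    mask = (1 << n_bits) - 1
    xu, yu = x & mask, y & mask            # two's-complement bit patterns
    total = xu + yu                        # n_bits + 1 bits, carry out included
    carry_out = (total >> n_bits) & 1
    sum_msb = (total >> (n_bits - 1)) & 1  # MSB of the sum before the shift
    same_sign = (xu >> (n_bits - 1)) == (yu >> (n_bits - 1))
    # Same signs: take the carry out. Different signs: keep the sum's own MSB.
    msb = carry_out if same_sign else sum_msb
    pattern = (msb << (n_bits - 1)) | ((total & mask) >> 1)
    # Convert the n-bit pattern back to a signed Python integer.
    return pattern - (1 << n_bits) if msb else pattern
```

For instance, `signed_average(-1, 1)` produces a carry out of 1 but a sum whose most significant bit is 0, so the different-sign case selects 0 and the result is 0.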

Various modifications to the present invention 
will become apparent to those skilled in the art 
from the foregoing description and accompanying 
drawings. Accordingly, the present invention is to 
be limited solely by the scope of the following
claims. 



Claims 

1. An apparatus for operating on the contents of
an X word having bits X_i and a Y word having
bits Y_i to generate a result word having bits Z_i,
where i = 0 to N-1, where Z_0 is the least significant
bit of one of said sub-words and Z_(N-1) is
the most significant bit of one of said sub-words,
said apparatus comprising: means for
partitioning said X, Y and result words into a
plurality of sub-words, there being one sub-word
of said Y and result words corresponding
to each sub-word of said X word; means,
responsive to a first instruction, for generating
the sum of each X sub-word and the corresponding
Y sub-word, the result thereof determining
said corresponding sub-word of said
result word; and means, responsive to a third
instruction, for generating the sum divided by
two of each sub-word in said X word and the
corresponding sub-word in said Y word, the
result thereof determining said corresponding
sub-word of said result word.

2. The apparatus of Claim 1 wherein each of said
results is rounded to the next highest odd 
integer if said division by two resulted in a 
round off error. 

3. An apparatus for operating on the contents of
an X word having bits X_i and a Y word having
bits Y_i to generate a result word having bits Z_i,
where i = 0 to N-1, where Z_0 is the least significant
bit of one of said sub-words and Z_(N-1) is
the most significant bit of one of said sub-words,
said apparatus comprising: means for
partitioning said X, Y and result words into a
plurality of sub-words, there being one sub-word
of said Y and result words corresponding
to each sub-word of said X word; means,
responsive to a first instruction, for generating
the sum of each X sub-word and the corresponding
Y sub-word, the result thereof determining
said corresponding sub-word of said
result word; and means, responsive to a second
instruction, for generating the difference of
each sub-word in said X word and the corresponding
sub-word in said Y word, the result
thereof determining said corresponding sub-word
of said result word; and means, responsive
to a fourth instruction, for generating the
difference divided by two of each sub-word in
said X word and the corresponding sub-word
in said Y word, the result thereof determining
said corresponding sub-word of said result
word.

4. The apparatus of Claim 1 wherein said apparatus
comprises means for operating on one or
more bits from said X word and corresponding
bits derived from said Y word, and means for
generating a generate bit and a propagate bit
corresponding to each said bit of said X word,
and wherein said partitioning means comprises
means for forcing said generate bit and said
propagate bit to values determined by the operation
being performed if said bit of said X
word is the most significant bit of a sub-word.



5. The apparatus of Claim 1 wherein said apparatus
comprises a plurality of adding stages connected
in series, each said adding stage comprising
means for operating on one or more
bits from said X word and corresponding bits
derived from said Y word, and means for propagating
a carry bit from said adding stage to
the next said adding stage in said series connection,
and wherein said partitioning means
comprises means for forcing said carry bit to a
value determined by the operation being performed
if said adding stages operate on bits of
different sub-words in said X word.

6. The apparatus of Claim 5 wherein said adding
stages generate N result bits, S_i, for i = 0 to N-1,
S_i being derived from operations on X_i, and
wherein said apparatus further comprises N
multiplexing means, each said multiplexing
means generating a signal indicative of one of
said result bits, there being one said multiplexing
means for each said result bit Z_p, for p = 0
to N-1, said multiplexing means corresponding
to Z_p connecting S_p to Z_p in response to said
first instruction.

7. The apparatus of Claim 6 wherein said multiplexing
means corresponding to Z_p connects
Z_p to S_(p+1) in response to said third instruction
if Z_p is interior to one of said sub-words and to
C_p if Z_p is the most significant bit of one of
said result sub-words, wherein C_p is said carry
bit generated by said adding stage operating
on the most significant bit of said X sub-word
corresponding to said result sub-word containing
Z_p.

8. An apparatus for operating on the contents of
an X word having bits X_i and a Y word having
bits Y_i to generate a result word having bits Z_i,
where i = 0 to N-1, where Z_0 is the least significant
bit and Z_(N-1) is the most significant bit,
said apparatus comprising: means, responsive
to an average instruction, for generating the
sum of said X and Y words; and means for
shifting said sum to generate said Z word.

9. The apparatus of Claim 8 wherein said Z word
is rounded to the next highest odd integer if
said shift resulted in a round off error.

10. The apparatus of Claim 8 further comprising
means, responsive to a difference instruction,
for generating the difference divided by two of
said X word and said Y word, the result thereof
determining said result word.